Audio Classification Using Class-Specific Learned Descriptors
Abstract
This paper presents a classification scheme for audio signals based on high-level feature descriptors. The descriptor is designed to capture how relevant each acoustic feature group (that is, a feature set such as mel-frequency cepstral coefficients or perceptual features) is to recognizing a given audio class. To this end, a bank of relevance vector machine (RVM) classifiers is built, with one classifier for each (audio class, feature group) pair. The responses of an input signal to this bank of RVM classifiers form the entries of the descriptor, so each entry measures the proximity of the input signal to one audio class based on a single feature group. This representation offers two advantages. First, it helps determine how effective each feature group is at classifying a specific audio class. Second, the descriptor is more discriminative than the low-level feature groups: a simple support vector machine (SVM) classifier trained on the descriptor outperforms several state-of-the-art methods.
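The descriptor construction can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation: the Python standard library has no RVM, so plain logistic regression (trained by batch gradient descent) is swapped in as the probabilistic per-pair classifier, and training each classifier one-vs-rest on a single feature group is an assumption. The names `BinaryScorer`, `train_bank`, and `descriptor`, the feature-group keys `"mfcc"` and `"perceptual"`, the class names, and the toy data are all made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BinaryScorer:
    """Hypothetical stand-in for one RVM: plain logistic regression
    trained with batch gradient descent on the average log-loss."""

    def __init__(self, lr: float = 0.5, epochs: int = 2000) -> None:
        self.lr = lr
        self.epochs = epochs
        self.w: List[float] = []
        self.b = 0.0

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "BinaryScorer":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba(xi) - yi  # d(log-loss)/d(logit)
                for j in range(d):
                    gw[j] += err * xi[j]
                gb += err
            self.w = [wj - self.lr * gj / n for wj, gj in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self

    def predict_proba(self, x: Sequence[float]) -> float:
        return _sigmoid(self.b + sum(wj * xj for wj, xj in zip(self.w, x)))


# A sample maps each feature-group name to that group's feature vector.
Sample = Dict[str, List[float]]
Bank = Dict[Tuple[str, str], BinaryScorer]


def train_bank(samples: List[Sample], labels: List[str],
               classes: List[str], groups: List[str]) -> Bank:
    # One binary scorer per (audio class, feature group) pair, trained
    # one-vs-rest on that group's features only (an assumption).
    bank: Bank = {}
    for c in classes:
        y = [1 if lab == c else 0 for lab in labels]
        for g in groups:
            bank[(c, g)] = BinaryScorer().fit([s[g] for s in samples], y)
    return bank


def descriptor(sample: Sample, bank: Bank,
               classes: List[str], groups: List[str]) -> List[float]:
    # Entry (c, g): proximity of the sample to class c, judged from group g alone.
    return [bank[(c, g)].predict_proba(sample[g]) for c in classes for g in groups]


if __name__ == "__main__":
    classes, groups = ["speech", "music"], ["mfcc", "perceptual"]
    samples = [
        {"mfcc": [0.0, 0.1], "perceptual": [1.0]},
        {"mfcc": [0.2, 0.0], "perceptual": [0.9]},
        {"mfcc": [1.0, 0.9], "perceptual": [0.1]},
        {"mfcc": [0.9, 1.1], "perceptual": [0.0]},
    ]
    labels = ["speech", "speech", "music", "music"]
    bank = train_bank(samples, labels, classes, groups)
    # Descriptor of a new signal: len(classes) * len(groups) entries.
    print(descriptor({"mfcc": [0.1, 0.05], "perceptual": [0.95]}, bank, classes, groups))
```

Stacking the descriptors of the training signals gives the input to the final classifier. The abstract uses an SVM at that stage, which this sketch leaves out (scikit-learn's `sklearn.svm.SVC` is one readily available option). Because each entry comes from a single feature group, inspecting the per-group entries for a class is one way to read off how useful that group is for that class.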
Similar resources
Learning Sparse Adversarial Dictionaries For Multi-Class Audio Classification
Audio events often overlap and are more prone to noise than visual signals. There is increasing evidence that representations learned using sparse dictionaries perform well in applications such as audio denoising and speech enhancement. This paper concentrates on modifying the traditional reconstructive dictionary learning algorithms, by incorporating a di...
Sound Classification and Similarity
The sound classification and indexing tools are organized into low-level descriptors (LLD), AudioSpectrumBasis and AudioSpectrumProjection, and high-level description schemes (DSs), SoundModel and SoundClassificationModel, which are based on the ContinuousHiddenMarkovModel and ProbabilityClassificationModel DSs defined in the Multimedia Description Schemes (MDS) document. The tools provide for ...
Voice-based Age and Gender Recognition using Training Generative Sparse Model
Gender recognition and age detection are important problems in telephone speech processing for establishing the identity of an individual from voice characteristics. In this paper, a new gender and age recognition system is introduced, based on generative incoherent models learned using sparse non-negative matrix factorization and an atom-correction post-processing method. Similar to genera...
Representing nonspeech audio signals through speech classification models
The human auditory system is very well matched to both human speech and environmental sounds. Therefore, the question arises whether human speech material may provide useful information for training systems for analyzing nonspeech audio signals, such as in a recognition task. To find out how similar nonspeech signals are to speech, we measure the closeness between target nonspeech signals and d...
Comparing audio descriptors for singing voice detection in music audio files
Given the relevance of the singing voice in popular western music, a system able to reliably identify the portions of a music audio file that contain vocals would be very useful. In this work, we explore previously used descriptors for this task and compare the performance of a statistical classifier using each of them, concluding that MFCCs are the most appropriate. As an outcome of ou...